The genome sequence of a heart cockle, Fragum fragum (Linnaeus, 1758)

We present a genome assembly from an individual specimen of Fragum fragum (a heart cockle; Mollusca; Bivalvia; Veneroida; Cardiidae). The genome sequence is 1,153.1 megabases in span. Most of the assembly is scaffolded into 19 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 22.36 kilobases in length. Gene annotation of this assembly on Ensembl identified 17,262 protein coding genes.


Background
The marine bivalve subfamily Fraginae (heart cockles) includes more than 50 described species found in temperate and tropical waters worldwide (Kirkendale, 2009). It contains one non-symbiotic clade and one symbiotic clade (Li et al., 2020); species in the symbiotic clade maintain a photosymbiotic relationship with dinoflagellate symbionts of the family Symbiodiniaceae (Kirkendale & Paulay, 2017). This provides an ideal opportunity for comparative studies of the origin and molecular mechanisms of animal photosymbiosis. Fragum fragum is one of the most broadly distributed species in its genus, ranging from the eastern African coast to the Tuamotu Archipelago (Kirkendale et al., 2021). It is semi-epifaunal and lives in sandy sediments. It is known to associate with symbionts of the genus Cladocopium (Li et al., 2018). F. fragum shells exhibit unique morphological adaptations to photosymbiosis, including a large surface-to-volume ratio, a flattened posterior side, and transparent shell microstructures (windows) (Kirkendale, 2009). A whole-genome assembly of F. fragum has the potential to address important questions about the molecular mechanisms, evolution, and adaptation of photosymbiosis. For example, what genetic innovations allow these cockles to host symbionts in a specialised tubular system? What molecular mechanism underlies the shell window structure, which allows the animals to meet the light requirements of their symbionts? Does F. fragum share molecular mechanisms of photosymbiosis with other photosymbiotic organisms, such as corals? This high-quality genome lays the foundation for a new model system for investigating photosymbiosis in Metazoa.

Genome sequence report
The genome was sequenced from a specimen of Fragum fragum (Figure 1) collected from East Hagatna Bay Beach, Guam, USA (13.49, 144.77). Pacific Biosciences single-molecule HiFi long reads were generated to a total of 14-fold coverage. Primary assembly contigs were scaffolded with chromosome conformation capture (Hi-C) data. Manual assembly curation corrected 98 missing joins or mis-joins and removed 99 haplotypic duplications. This reduced the assembly length by 4.02% and the scaffold number by 68.80%, and increased the scaffold N50 by 0.31%.
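Coverage is the total number of sequenced bases divided by the genome size. The sketch below is a minimal illustration of that arithmetic. It assumes the 1,153,092,446 bp assembly span given in Figure 2 approximates the genome size; a k-mer based genome size estimate could differ. It shows the HiFi yield implied by 14-fold coverage and, for comparison, by the 30-fold figure often cited for high-quality assemblies. The function names are illustrative only.

```python
# Mean coverage = total sequenced bases / genome size.
# Assumption: the final assembly span stands in for the true genome size.
GENOME_SIZE_BP = 1_153_092_446  # assembly span reported in Figure 2


def yield_for_coverage(coverage: float, genome_size_bp: int) -> float:
    """Total read bases needed to reach a given mean coverage."""
    return coverage * genome_size_bp


def coverage_from_yield(total_bases: float, genome_size_bp: int) -> float:
    """Mean coverage implied by a given total read yield."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    for depth in (14, 30):
        gigabases = yield_for_coverage(depth, GENOME_SIZE_BP) / 1e9
        print(f"{depth}x coverage needs about {gigabases:.2f} Gb of HiFi reads")
```

Under this assumption, 14-fold coverage corresponds to roughly 16.14 Gb of HiFi data, and 30-fold coverage to roughly 34.59 Gb.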
The final assembly has a total length of 1153.1 Mb in 38 sequence scaffolds with a scaffold N50 of 63.9 Mb (Table 1). The snail plot in Figure 2 summarises the assembly statistics, and Figure 3 shows the distribution of assembly scaffolds by GC proportion and coverage. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.96%) of the assembly sequence was assigned to 19 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). Although not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
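The N50 (and likewise N90) can be computed directly from scaffold lengths. Sort the lengths from longest to shortest and accumulate them until the running total reaches the chosen percentage of the assembly length; the scaffold length at that point is the Nx value. The sketch below is a minimal illustration on hypothetical toy lengths, not the pipeline used for this assembly.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float = 50.0) -> int:
    """Return the Nx length: the largest L such that scaffolds of length >= L
    together contain at least x% of the total assembly length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    threshold = sum(ordered) * x / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # guard against floating-point rounding when x == 100


# Toy example with five hypothetical scaffold lengths (total 300).
toy = [100, 80, 60, 40, 20]
print(nx(toy, 50), nx(toy, 90))  # 80 40
```

Read this way, a scaffold N50 of 63.9 Mb means that scaffolds of at least 63.9 Mb together hold at least half of the 1,153.1 Mb assembly.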

Sample acquisition and nucleic acid extraction
The Fragum fragum specimens used for DNA sequencing (specimen ID NSU0013502, ToLID xbFraFrag2) and for Hi-C and RNA sequencing (specimen ID NSU0013501, ToLID xbFraFrag1) were collected from East Hagatna Bay in front of Alupang Island, Guam, USA (latitude 13.49, longitude 144.77) on 2021-06-17 by sifting sand through 1 mm metal sifters while snorkelling. The specimens were collected by Sarah Lemer (University of Guam Marine Lab), identified by Ruiqi Li (University of Colorado, Boulder), and preserved by flash-freezing in liquid nitrogen.
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation; sample homogenisation, …

A Hi-C map for the final assembly was produced from Hi-C reads mapped with bwa-mem2 (Vasimuddin et al., 2019) and stored in the Cooler file format (Abdennur & Mirny, 2020). To assess assembly quality, k-mer completeness and consensus quality values (QV) were calculated with Merqury (Rhie et al., 2020).
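Merqury derives its consensus QV from k-mers found in the assembly but not in the reads. In the Merqury paper, the fraction of read-supported assembly k-mers is treated as the probability that all k bases of a k-mer are correct. Taking the k-th root gives a per-base accuracy, and QV = -10 log10(per-base error). K-mer completeness is the percentage of reliable read k-mers that also occur in the assembly. The sketch below restates these formulas. It does not reproduce the Merqury tool itself, and the counts and k = 21 in the usage lines are hypothetical.

```python
import math


def merqury_style_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Consensus QV from k-mer counts.

    asm_only_kmers: k-mers present in the assembly but absent from the reads
    total_asm_kmers: all k-mers in the assembly
    """
    if total_asm_kmers <= 0 or not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("invalid k-mer counts")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: QV is unbounded
    if asm_only_kmers == total_asm_kmers:
        return 0.0  # every k-mer unsupported: per-base error rate of 1
    # Fraction of supported k-mers = P(base correct) ** k, so take the k-th root.
    log_p_base = math.log1p(-asm_only_kmers / total_asm_kmers) / k
    error_rate = -math.expm1(log_p_base)  # 1 - P(base correct), computed stably
    return -10.0 * math.log10(error_rate)


def kmer_completeness(asm_solid_kmers: int, read_solid_kmers: int) -> float:
    """Percentage of reliable ('solid') read k-mers that are present in the assembly."""
    return 100.0 * asm_solid_kmers / read_solid_kmers


# Hypothetical counts for illustration only (not values from this assembly).
print(round(merqury_style_qv(100_000, 1_000_000_000, k=21), 1))
print(kmer_completeness(90, 100))  # 90.0
```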

Genome annotation
The Ensembl Genebuild annotation system at the EBI (Aken et al., 2016) was used to generate annotation for the Fragum fragum assembly (GCA_946902895.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute – Legal and Governance
The materials that have contributed to this genome note have been supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material
• Legality of collection, transfer and use (national and international)
Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.

Masa-Aki Yoshida
Shimane University, Matsue, Japan
Li et al. present the genome of the bivalve Fragum fragum. The assembled genome could be a valuable genetic resource for studying the species' unique adaptations to photosymbiosis. However, as other reviewers have noted, a couple of concerns should be addressed prior to indexing.
The low BUSCO score is problematic; mollusca_odb10 may not be an appropriate lineage dataset for this species. The BUSCO analysis should also be re-run against the entire metazoan_odb reference set and with closely related species, and the results included in the text.
Another point is that possible contamination has not been adequately checked. Contamination by the genome of the Cladocopium symbionts should be tested by mapping the Fragum scaffolds with a long-read mapper such as minimap, and the result stated.
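One way to summarise the check suggested above is to align the Fragum scaffolds (as queries) to a Cladocopium reference with minimap2. The mapper writes PAF by default. You can then compute, for each scaffold, the fraction of its length covered by at least one alignment. The sketch below parses the first four PAF columns (query name, length, start, end) and column 12 (mapping quality), then merges overlapping intervals. The scaffold and contig names in the usage lines are hypothetical. Any coverage threshold for flagging a scaffold as a possible contaminant would be a judgement call.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def query_coverage_from_paf(paf_lines: Iterable[str], min_mapq: int = 0) -> Dict[str, float]:
    """Fraction of each query sequence covered by at least one PAF alignment."""
    intervals: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    lengths: Dict[str, int] = {}
    for line in paf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])  # 0-based, end-exclusive
        mapq = int(fields[11])
        lengths[qname] = qlen
        if mapq >= min_mapq:
            intervals[qname].append((qstart, qend))
    coverage: Dict[str, float] = {}
    for qname, qlen in lengths.items():
        covered, cur_start, cur_end = 0, None, None
        for start, end in sorted(intervals[qname]):  # merge overlapping intervals
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start
        coverage[qname] = covered / qlen
    return coverage


# Hypothetical PAF records (query = Fragum scaffold, target = Cladocopium contig).
paf = [
    "scafA\t1000\t0\t100\t+\tclado1\t5000\t0\t100\t90\t100\t60",
    "scafA\t1000\t50\t300\t+\tclado2\t5000\t0\t250\t200\t250\t60",
    "scafB\t2000\t0\t100\t-\tclado1\t5000\t10\t110\t80\t100\t0",
]
print(query_coverage_from_paf(paf))  # {'scafA': 0.3, 'scafB': 0.05}
```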

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Partly

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Comparative genomics, Evolutionary biology
Li et al. present the genome sequence of Fragum fragum, a marine bivalve notable for its photosymbiotic relationships. This work is a valuable contribution to marine biology and genomics, and the genome's quality is central to that value.
Although the assembly has a high scaffold N50 of 63.9 Mb, the BUSCO completeness is only 78.6%. This is relatively low and might indicate incompleteness in the assembly or annotation. Moreover, the HiFi read coverage is only 14-fold, short of the typical 30-fold standard for high-quality genomes. Furthermore, the manuscript lacks detailed information on Hi-C read coverage, and the reported Hi-C interactions within each chromosome appear weak. The authors are encouraged to re-evaluate the gene annotation and validate the chromosomal structure to enhance the credibility and utility of the assembly.

Charles Plessy
Okinawa Institute of Science and Technology, Okinawa, Japan
This report follows the Tree of Life project's standard pattern, which makes it easy to review. I examined the contact map and the assembly statistics and found no obvious defect.
The BUSCO score is below 80%, which raises the possibility that the assembly is incomplete. However, the publicly available BUSCO scores for the order Cardiida all show very similar results, which is reassuring. This should be noted in the manuscript, as not every reader will be able to find this information online (https://blobtoolkit.genomehubs.org/view/Cardiida#Datasets).
Is the rationale for creating the dataset(s) clearly described?
The assembled genome demonstrates high contiguity; however, its completeness appears somewhat limited, with a BUSCO completeness score of 78.6%. This is relatively low compared with similar genomes produced using the same methodology and falls below the desired standards. Perhaps the authors should re-run BUSCO against the full metazoan dataset as well.
Nonetheless, this genome represents a valuable addition to the databases for an underrepresented species and deserves recognition in this manuscript.
An important aspect to highlight is that Fragum fragum belongs to a group prone to exhibiting transmissible tumors, making this genomic data particularly significant for future studies on disease mechanisms and prevention in bivalves.
On the downside, the photograph of the shell provided in this manuscript may be insufficient for purely taxonomic purposes, as it does not show the hinge, which is typically the most informative taxonomic character in bivalves. However, the specimen is stored and its identification is accurate, so specialised taxonomists can examine it on-site if necessary.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Molecular cytogenetics, repetitive DNA, genomics, malacology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Fragum fragum, xbFraFrag2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 1,153,092,446 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (88,961,846 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (63,927,008 and 49,258,289 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the mollusca_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMPPX01/dataset/CAMPPX01/snail.
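In this assembly each of the 1,000 snail-plot bins spans about 1,153,092 bp (0.1% of 1,153,092,446 bp). The sketch below shows one way to compute per-bin GC, AT and N percentages. It concatenates the scaffolds from longest to shortest, splits the result into equal-width bins, and counts bases in each bin. This is an assumption about the general approach and not BlobToolKit's own code. Holding a whole 1.15 Gb assembly in one string is only practical for illustration, so a real implementation would stream the sequence. Ambiguity codes other than N are ignored, so the three percentages need not sum to 100.

```python
from typing import List, Tuple


def binned_base_composition(sequences: List[str], n_bins: int = 1000) -> List[Tuple[float, float, float]]:
    """Split the size-ordered, concatenated assembly into n_bins equal-width bins
    and return (GC%, AT%, N%) for each bin."""
    ordered = "".join(sorted(sequences, key=len, reverse=True)).upper()
    total = len(ordered)
    if n_bins <= 0 or total < n_bins:
        raise ValueError("need at least one base per bin")
    composition = []
    for i in range(n_bins):
        window = ordered[i * total // n_bins:(i + 1) * total // n_bins]
        size = len(window)
        gc = window.count("G") + window.count("C")
        at = window.count("A") + window.count("T")
        n = window.count("N")
        composition.append((100.0 * gc / size, 100.0 * at / size, 100.0 * n / size))
    return composition


# Toy example: three hypothetical scaffolds, five bins of two bases each.
print(binned_base_composition(["GGCC", "AATT", "NN"], n_bins=5))
```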

Figure 3. Genome assembly of Fragum fragum, xbFraFrag2.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMPPX01/dataset/CAMPPX01/blob.

Figure 4. Genome assembly of Fragum fragum, xbFraFrag2.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CAMPPX01/dataset/CAMPPX01/cumulative.

Figure 5. Genome assembly of Fragum fragum, xbFraFrag2.1: Hi-C contact map of the xbFraFrag2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=DacnP7w_QK2J95ibXtLPgA.
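A contact map such as the one in Figure 5 is, at its core, a matrix of read-pair counts in fixed-size genomic bins. The sketch below bins intra-chromosomal pairs of 0-based mapped positions, for example the two ends of Hi-C read pairs after alignment, into a symmetric count matrix for a single chromosome. It omits the filtering, balancing and multi-resolution storage that tools such as Cooler and HiGlass provide. The positions, chromosome length and bin size in the usage lines are hypothetical.

```python
from typing import Iterable, List, Tuple


def contact_matrix(pairs: Iterable[Tuple[int, int]], chrom_length: int, bin_size: int) -> List[List[int]]:
    """Count intra-chromosomal read-pair contacts in fixed-size bins (symmetric)."""
    if chrom_length <= 0 or bin_size <= 0:
        raise ValueError("chromosome length and bin size must be positive")
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        if not (0 <= pos1 < chrom_length and 0 <= pos2 < chrom_length):
            continue  # skip positions outside the chromosome
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


# Hypothetical read-pair positions on a 300 bp toy chromosome, 100 bp bins.
for row in contact_matrix([(10, 20), (10, 150), (250, 260), (150, 10)], 300, 100):
    print(row)
```

A strong diagonal reflects the many contacts between nearby loci. Within a correctly scaffolded chromosome, contact frequency is expected to decay smoothly away from the diagonal.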
Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Partly
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Mollusc genetics and genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Reviewer Report 09 July 2024
https://doi.org/10.21956/wellcomeopenres.23378.r87891
© 2024 Plessy C. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2024 Garcia-Souto D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Daniel Garcia-Souto
University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
Li et al. present the first comprehensive description of the genome of the bivalve Fragum fragum.

Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
https://doi.org/10.21956/wellcomeopenres.23378.r87888